History Based Unsupervised Data Oriented Parsing
Authors
Abstract
Grammar induction is a basic step in natural language processing. Based on the amount of information each method uses, grammar induction methods fall into three types: supervised, unsupervised, and semi-supervised. Supervised and semi-supervised methods require large treebanks, which may not currently exist for many languages. Accordingly, many researchers have focused on unsupervised methods. Unsupervised Data Oriented Parsing (UDOP) is currently the state of the art in unsupervised grammar induction. In this paper, we show that UDOP performs worse on free-word-order languages such as Persian than on fixed-word-order languages such as English. We also introduce a novel approach called History-Based Unsupervised Data Oriented Parsing, and show that using history information can significantly improve the performance of UDOP, especially on free-word-order languages.
Similar resources
Unsupervised-oriented Hits Appraisal
When literature-based evidence or explanations are needed, it is possible to use unsupervised-oriented hits appraisal. Unsupervised-oriented hits appraisal allows us to explore hits (e.g., research papers) with an open mind, and it helps analyze the contents of the selection. Outcomes of the unsupervised-oriented hits appraisal are associations, clusters or topics that are produced unsupervised by the tec...
A Linguistic Investigation into Unsupervised DOP
Unsupervised Data-Oriented Parsing models (U-DOP) represent a class of structure bootstrapping models that have achieved some of the best unsupervised parsing results in the literature. While U-DOP was originally proposed as an engineering approach to language learning (Bod 2005, 2006a), it turns out that the model has a number of properties that may also be of linguistic and cognitive interest...
A U-DOP approach to modeling language acquisition
In linguistics, there is a debate between empiricists and nativists: the former believe that language is acquired from experience, the latter that there is an innate component for language. The main arguments adduced by nativists are Arguments from Poverty of Stimulus. It is claimed that children acquire certain phenomena, which they cannot learn on the basis of experience alone —and therefore,...
Automating Construction Work: Data-Oriented Parsing and Constructivist Accounts of Language Acquisition
The constructionist approach to language has long proven its merits as a theoretical framework guiding linguistic observations. However, relatively little work has been dedicated to providing a precise, formalized definition of constructions and the mechanisms by means of which they are acquired. In giving an overview of recent work in Data-Oriented Parsing (DOP), we show how the theoretical de...
Corpus-Based Induction of Syntactic Structure: Models of Dependency and Constituency
We present a generative model for the unsupervised learning of dependency structures. We also describe the multiplicative combination of this dependency model with a model of linear constituency. The product model outperforms both components on their respective evaluation metrics, giving the best published figures for unsupervised dependency parsing and unsupervised constituency parsing. We als...